119 research outputs found
Toward Efficient Automated Feature Engineering
Automated Feature Engineering (AFE) refers to automatically generating and
selecting optimal feature sets for downstream tasks, and it has achieved great
success in real-world applications. Current AFE methods mainly focus on
improving the effectiveness of the produced features but ignore the
low-efficiency issue that arises in large-scale deployment. Therefore, in this work, we
propose a generic framework to improve the efficiency of AFE. Specifically, we
construct the AFE pipeline in a reinforcement learning setting, where each
feature is assigned an agent to perform feature transformation and
selection, and the evaluation score of the produced features in downstream
tasks serves as the reward to update the policy. We improve the efficiency of
AFE from two perspectives. On the one hand, we develop a Feature Pre-Evaluation
(FPE) Model to reduce the sample size and feature size, the two main
factors undermining the efficiency of feature evaluation. On the other hand,
we devise a two-stage policy training strategy that uses FPE on the
pre-evaluation task to initialize the policy, avoiding training the
policy from scratch. We conduct comprehensive experiments on 36 datasets in
terms of both classification and regression tasks. The results show
higher average performance and 2x higher computational efficiency compared
to state-of-the-art AFE methods.
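The pipeline described above can be sketched, very loosely, as a per-feature bandit loop. Everything below — the transformation set, the correlation-based pre-evaluation score, and the incremental update rule — is a hypothetical stand-in for the paper's components, meant only to show the control flow of agents, cheap FPE-style reward, and policy update:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical unary transformation actions available to each feature's agent.
TRANSFORMS = [lambda c: c, np.square, lambda c: np.log1p(np.abs(c))]

def pre_evaluate(X, y, sample=64):
    # Stand-in "pre-evaluation": mean |correlation| with the target on a
    # subsample, echoing the FPE idea of shrinking sample size before a
    # full downstream-model evaluation.
    idx = rng.choice(len(X), size=min(sample, len(X)), replace=False)
    Xs, ys = X[idx], y[idx]
    corrs = [abs(np.corrcoef(Xs[:, j], ys)[0, 1]) for j in range(X.shape[1])]
    return float(np.nanmean(corrs))

def afe_search(X, y, episodes=30, eps=0.2):
    n_feat = X.shape[1]
    q = np.zeros((n_feat, len(TRANSFORMS)))    # per-feature action values
    counts = np.ones_like(q)
    best_score, best_actions = -np.inf, [0] * n_feat
    for _ in range(episodes):
        # Epsilon-greedy action choice per feature agent.
        actions = [
            int(rng.integers(len(TRANSFORMS))) if rng.random() < eps
            else int(np.argmax(q[j]))
            for j in range(n_feat)
        ]
        Xt = np.column_stack([TRANSFORMS[a](X[:, j]) for j, a in enumerate(actions)])
        reward = pre_evaluate(Xt, y)           # cheap FPE-style reward
        for j, a in enumerate(actions):        # incremental value update
            counts[j, a] += 1
            q[j, a] += (reward - q[j, a]) / counts[j, a]
        if reward > best_score:
            best_score, best_actions = reward, actions
    return best_score, best_actions

X = rng.normal(size=(500, 3))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=500)  # square of feature 0 matters
score, actions = afe_search(X, y)
```

A real system would replace `pre_evaluate` with training a downstream model, which is exactly the cost the FPE model is designed to avoid paying on full-size data.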
Effect of nonlinear and noncollinear transformation strain pathways in phase-field modeling of nucleation and growth during martensite transformation
The phase-field microelasticity theory has exhibited great capability in studying elasticity and its effects on microstructure evolution due to various structural and chemical non-uniformities (impurities and defects) in solids. However, the usually adopted linear and/or collinear coupling between eigen transformation strain tensors and order parameters in phase-field microelasticity excludes many nonlinear transformation pathways that have been revealed in atomistic calculations. Here we extend phase-field microelasticity by adopting general nonlinear and noncollinear eigen transformation strain paths, which allows for the incorporation of complex transformation pathways and provides a multiscale modeling scheme linking atomistic mechanisms with overall kinetics to better describe solid-state phase transformations. Our case study on a generic cubic-to-tetragonal martensitic transformation shows that nonlinear transformation pathways can significantly alter the nucleation and growth rates, as well as the configuration and activation energy of the critical nuclei. It is also found that for a pure-shear martensitic transformation, depending on the actual transformation pathway, the nuclei and austenite/martensite interfaces can have nonzero far-field hydrostatic stress and may thus interact with other crystalline defects, such as point defects and/or a background tension/compression field, in a more profound way than expected from a linear transformation pathway. Further significance is discussed regarding the implications for vacancy clustering at austenite/martensite interfaces and segregation at coherent precipitate/matrix interfaces.
National Science Foundation (U.S.). Division of Materials Research (DMR-1410322)
National Science Foundation (U.S.). Division of Materials Research (DMR-1410636)
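The modeling change can be stated minimally as follows (notation illustrative, not the paper's exact formulation): the conventional coupling makes the eigen transformation strain linear and collinear in the order parameter, whereas the extension lets it trace a general path between the parent and product states:

```latex
% Conventional linear/collinear coupling:
\varepsilon^{*}_{ij}(\eta) \;=\; \eta \, \varepsilon^{00}_{ij}
% General nonlinear, noncollinear pathway (illustrative form):
\varepsilon^{*}_{ij}(\eta) \;=\; f_{ij}(\eta), \qquad
f_{ij}(0) = 0, \quad f_{ij}(1) = \varepsilon^{00}_{ij}
```

In the linear case all tensor components evolve proportionally, so the principal directions are fixed along the path; a general $f_{ij}(\eta)$ allows them to rotate, which is what can admit a nonzero far-field hydrostatic stress even for a nominally pure-shear transformation.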
Deep Generative Imputation Model for Missing Not At Random Data
Data analysis usually suffers from the Missing Not At Random (MNAR) problem,
where the cause of the missingness is not fully observed. Compared to the
naive Missing Completely At Random (MCAR) setting, MNAR is more in line with
realistic scenarios, while being more complex and challenging. Existing
statistical methods model the MNAR mechanism through different decompositions
of the joint distribution of the complete data and the missing mask. However,
we empirically find that directly incorporating these statistical methods into
deep generative models is sub-optimal: doing so neglects the confidence of the
reconstructed mask during the MNAR imputation process, which leads to
insufficient information extraction and less-guaranteed imputation quality. In
this paper, we revisit the MNAR problem from a novel perspective that the
complete data and missing mask are two modalities of incomplete data on an
equal footing. Along this line, we put forward a generative-model-specific
joint probability decomposition method, conjunction model, to represent the
distributions of two modalities in parallel and extract sufficient information
from both the complete data and the missing mask. Taking a step further, we
develop a deep generative imputation model, namely GNR, to process the real-world missing
mechanism in the latent space and concurrently impute the incomplete data and
reconstruct the missing mask. The experimental results show that our GNR
surpasses state-of-the-art MNAR baselines by significant margins (average
improvements from 9.9% to 18.8% in RMSE) and consistently achieves better
mask reconstruction accuracy, which makes the imputation more principled.
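For orientation, the two classical decompositions mentioned above can be written alongside one plausible reading of the parallel, two-modality treatment; this shared-latent form is our illustrative interpretation, not the paper's stated factorization ($x$ the complete data, $m$ the missing mask, $z$ a shared latent variable):

```latex
% Selection model:         p(x, m) = p(x)\, p(m \mid x)
% Pattern-mixture model:   p(x, m) = p(m)\, p(x \mid m)
% Two modalities on an equal footing (shared-latent sketch):
p(x, m) \;=\; \int p(x \mid z)\, p(m \mid z)\, p(z)\, dz
```

Under such a form, neither modality is conditioned on the other, which matches the "equal footing" framing and lets the model extract information from both in parallel.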
Dish-TS: A General Paradigm for Alleviating Distribution Shift in Time Series Forecasting
The distribution shift in Time Series Forecasting (TSF), indicating series
distribution changes over time, largely hinders the performance of TSF models.
Existing works on distribution shift in time series are mostly limited to
quantifying the distribution and, more importantly, overlook the
potential shift between lookback and horizon windows. To address the above
challenges, we systematically summarize the distribution shift in TSF into two
categories. Regarding lookback windows as input-space and horizon windows as
output-space, there exist (i) intra-space shift, in which the distribution
within the input-space keeps shifting over time, and (ii) inter-space shift, in
which the distribution shifts between input-space and output-space. Then we
introduce Dish-TS, a general neural paradigm for alleviating distribution
shift in TSF. Specifically, for better distribution estimation, we propose the
coefficient net (CONET), which can be any neural architecture, to map input
sequences into learnable distribution coefficients. To relieve intra-space and
inter-space shift, we organize Dish-TS as a Dual-CONET framework that
separately learns the distributions of the input- and output-space, naturally
capturing the distribution difference between the two spaces. In addition, we
introduce a more effective training strategy for the otherwise intractable
CONET learning. Finally, we conduct
extensive experiments on several datasets coupled with different
state-of-the-art forecasting models. Experimental results show Dish-TS
consistently boosts them, with more than a 20% average improvement. Code is
available. Comment: Accepted by AAAI 202
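A minimal sketch of the Dual-CONET idea, under our assumption (not stated by the paper) that each CONET outputs a level/scale pair per window: the lookback is normalized with input-space coefficients before the backbone forecaster runs, and the output is de-normalized with output-space coefficients. The linear maps and toy backbone are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def conet(window, W, b):
    # Hypothetical minimal CONET: a linear map from the window to a
    # (level, log-scale) pair; any neural architecture could stand in here.
    h = window @ W + b
    level, scale = h[0], np.exp(h[1])   # exp keeps the scale positive
    return level, scale

def dish_ts_forecast(lookback, W_in, b_in, W_out, b_out, model):
    mu_in, sig_in = conet(lookback, W_in, b_in)       # input-space coefficients
    mu_out, sig_out = conet(lookback, W_out, b_out)   # output-space coefficients
    normed = (lookback - mu_in) / sig_in              # relieve intra-space shift
    raw = model(normed)                               # any backbone forecaster
    return raw * sig_out + mu_out                     # relieve inter-space shift

L, H = 24, 8
lookback = rng.normal(loc=5.0, size=L)
W_in = rng.normal(scale=0.01, size=(L, 2))
b_in = np.array([lookback.mean(), 0.0])
W_out = rng.normal(scale=0.01, size=(L, 2))
b_out = np.array([lookback.mean(), 0.0])
naive = lambda x: np.repeat(x[-1], H)   # toy backbone: repeat last value
forecast = dish_ts_forecast(lookback, W_in, b_in, W_out, b_out, naive)
```

Because the two CONETs are learned separately, the de-normalization step can shift and rescale the backbone's output toward the horizon window's own distribution, which is what addresses the inter-space shift.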
Reinforced Imitative Graph Learning for Mobile User Profiling
Mobile user profiling refers to the effort of extracting users’ characteristics from their mobile activities. To capture the dynamic variation of user characteristics and generate effective user profiles, we propose an imitation-based mobile user profiling framework. The intuition is that, if an autonomous agent is taught to imitate user mobility based on the user’s profile, the profile is most accurate when the agent can perfectly mimic the user’s behavior patterns. The profiling framework is formulated as a reinforcement learning task, where the agent is a next-visit planner, an action is a POI that a user will visit next, and the state of the environment is a fused representation of the user and spatial entities. Each event in which a user visits a POI constructs a new state, which helps the agent predict the user’s mobility more accurately. In the framework, we introduce a spatial Knowledge Graph (KG) to characterize the semantics of user visits over connected spatial entities. Additionally, we develop a mutual-updating strategy to quantify the state as it evolves over time. Along these lines, we develop a reinforced imitative graph learning framework for mobile user profiling. Finally, we conduct extensive experiments to demonstrate the superiority of our approach.
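The imitation loop can be caricatured with a softmax next-visit planner whose state (the profile vector) is updated toward the observed visits; the embeddings, the gradient update, and all sizes below are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(2)
n_poi, dim = 5, 4
poi_emb = rng.normal(size=(n_poi, dim))   # stand-in spatial-entity embeddings
profile = np.zeros(dim)                   # user profile doubles as agent state

def next_visit_scores(profile):
    # The planner scores every POI as the candidate next visit.
    return poi_emb @ profile

def imitate(profile, visits, lr=0.1):
    # Each observed visit event updates the state so the agent's next-visit
    # distribution imitates the user's actual behavior (cross-entropy descent).
    for poi in visits:
        scores = next_visit_scores(profile)
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        grad = poi_emb.T @ (probs - np.eye(n_poi)[poi])
        profile = profile - lr * grad
    return profile

visits = [1] * 200                        # toy user who repeatedly visits POI 1
profile = imitate(profile, visits)
pred = int(np.argmax(next_visit_scores(profile)))
```

When the agent's predictions match the visit stream, the learned `profile` is, by the imitation argument above, an accurate summary of the user — here it converges to predicting the user's habitual POI.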
Boosting Urban Traffic Speed Prediction via Integrating Implicit Spatial Correlations
Urban traffic speed prediction aims to estimate future traffic speeds to
improve urban transportation services. Enormous efforts have been made to
exploit the spatial correlations and temporal dependencies of traffic speed
evolution patterns by leveraging explicit spatial relations (geographical
proximity) through pre-defined geographical structures ({\it e.g.}, region
grids or road networks). While achieving promising results, current traffic
speed prediction methods still suffer from ignoring implicit spatial
correlations (interactions), which cannot be captured by grid/graph
convolutions. To tackle this challenge, we propose a generic model that enables
existing traffic speed prediction methods to preserve implicit spatial
correlations. Specifically, we first develop a Dual-Transformer architecture,
including a Spatial Transformer and a Temporal Transformer. The Spatial
Transformer automatically learns the implicit spatial correlations across the
road segments beyond the boundary of geographical structures, while the
Temporal Transformer aims to capture the dynamic changing patterns of the
implicit spatial correlations. Then, to further integrate both explicit and
implicit spatial correlations, we propose a distillation-style learning
framework, in which the existing traffic speed prediction methods are
considered as the teacher model, and the proposed Dual-Transformer
architecture is considered as the student model. Extensive experiments
on three real-world datasets demonstrate significant improvements of our
proposed framework over existing methods.
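The distillation-style setup can be illustrated with toy linear models standing in for both teacher and student; the blend weight `alpha`, the models, and the data below are our assumptions, meant only to show how the student fits a mixture of ground truth and teacher output:

```python
import numpy as np

rng = np.random.default_rng(3)
n_seg, T = 6, 32                               # road segments x history length
x = rng.normal(size=(n_seg, T))                # standardized speed histories
y = x[:, -1] + 0.1 * rng.normal(size=n_seg)    # next-step speed target

teacher = lambda x: 0.9 * x[:, -1]             # frozen "explicit" teacher model
W = np.zeros(T)                                # toy linear student parameters

def distill_step(x, y, W, lr=0.05, alpha=0.5):
    pred, soft = x @ W, teacher(x)
    # Gradient of (1-alpha)*MSE(pred, y) + alpha*MSE(pred, teacher output):
    # the student learns from labels and from the teacher simultaneously.
    grad = x.T @ ((1 - alpha) * (pred - y) + alpha * (pred - soft)) / len(x)
    return W - lr * grad

mse0 = float(np.mean((x @ W - y) ** 2))        # student error before training
for _ in range(300):
    W = distill_step(x, y, W)
mse = float(np.mean((x @ W - y) ** 2))         # student error after training
```

In the paper's setting the teacher would be an existing grid/graph-based predictor and the student the Dual-Transformer, so the distillation term is what transfers the explicit spatial knowledge while the student's own loss preserves the implicit correlations it learns.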
Expert Knowledge-Guided Length-Variant Hierarchical Label Generation for Proposal Classification
To advance the development of science and technology, research proposals are
submitted to open-court competitive programs developed by government agencies
(e.g., NSF). Proposal classification is one of the most important tasks to
achieve effective and fair review assignments. Proposal classification aims to
classify a proposal into a length-variant sequence of labels. In this paper, we
formulate the proposal classification problem into a hierarchical multi-label
classification task. Although there are certain prior studies, proposal
classification exhibits unique features: 1) the classification result of a
proposal is in a hierarchical discipline structure with different levels of
granularity; 2) proposals contain multiple types of documents; 3) domain
experts can empirically provide partial labels that can be leveraged to improve
task performances. In this paper, we focus on developing a new deep proposal
classification framework to jointly model the three features. In particular, to
sequentially generate labels, we leverage previously-generated labels to
predict the label of the next level; to integrate partial labels from experts, we
use the embedding of these empirical partial labels to initialize the state of
neural networks. Our model can automatically identify the best length of the
label sequence and stop further label prediction accordingly. Finally, we present extensive results
to demonstrate that our method can jointly model partial labels, textual
information, and semantic dependencies in label sequences, and, thus, achieve
advanced performance. Comment: 10 pages, accepted as a regular paper by ICDM 202
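A toy sketch of length-variant label generation: labels are produced level by level, each generated label is folded back into the state to condition the next prediction, an expert-provided partial label seeds the state, and a STOP symbol ends the sequence. The label set, embeddings, and state-update rule are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical hierarchical discipline labels plus an explicit STOP symbol.
labels = ["CS", "CS.AI", "CS.AI.ML", "STOP"]
emb = {name: rng.normal(size=8) for name in labels}
W = rng.normal(scale=0.1, size=(8, len(labels)))   # toy scoring parameters

def generate(state, max_len=5):
    seq = []
    for _ in range(max_len):
        logits = state @ W
        nxt = labels[int(np.argmax(logits))]
        if nxt == "STOP":
            break                                  # model decides sequence length
        seq.append(nxt)
        # Fold the previously-generated label back into the state so it
        # conditions the prediction at the next level of the hierarchy.
        state = 0.5 * state + 0.5 * emb[nxt]
    return seq

expert_partial = emb["CS"]    # expert's coarse partial label initializes state
seq = generate(expert_partial)
```

A trained model would learn `W` and the embeddings jointly, and could additionally mask the logits so each step only scores children of the previous label, enforcing the discipline hierarchy.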